Methods in Ecology and Evolution — Latest Matching Preprints

1

aniSNA : An R package to assess bias and uncertainty in social networks obtained from animals sampled via direct observations or satellite telemetry

Kaur, P.; Ciuti, S.; Reinking, A. K.; Beck, J. L.; Salter-Townshend, M.

2024-05-14 animal behavior and cognition 10.1101/2024.05.10.593659 medRxiv

Top 0.1%

88.0%

Animal social network analysis using GPS telemetry datasets provides insights into group dynamics, social structure, and interactions of the animal communities. It aids conservation by characterizing key aspects of animal sociality - including spatially explicit information on where sociality occurs (e.g., habitats, migratory corridors), contributing to informed management strategies for wildlife populations. The aniSNA package provides functions to assess and leverage data collected by sampling a subset of an animal population to perform social network analysis. The methodologies offered in this package are compatible with a variety of location and grouping data, collected through various means (e.g., direct observations, biologgers); however, they are particularly well suited to autocorrelated data streams such as data collected through GPS telemetry radio collars. The techniques assess the data's suitability to extract reliable statistical inferences from social networks and compute uncertainty estimates around the network metrics in the scenario where a fraction of the population is monitored. The package functions are user-friendly and allow for the implementation of pre-network data permutations for auto-correlated data streams, sensitivity analysis under downsampling, bootstrapping to establish confidence intervals for global and node-level network metrics, and correlation and regression analysis to assess the robustness of node-level network metrics. Using this package, animal ecologists will be able to compute social network metrics, both at the population and individual level, assess their reliability, and use such metrics in further analyses, e.g., to study social network variation within and across populations or link individual sociality to life history. This software also has plotting features that allow for visual interpretation of the findings.
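
The bootstrapping idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the aniSNA interface (aniSNA is an R package, and its resampling scheme is not described here); the function names, the choice of mean degree as the metric, and the percentile interval are all assumptions made for illustration.

```python
import random
from collections import Counter


def degrees(edges):
    """Count how many edges touch each individual in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def bootstrap_mean_degree_ci(edges, individuals, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for mean degree (hypothetical helper).

    Individuals are resampled with replacement and the mean of their observed
    degrees is recomputed each time. Returns (observed_mean, lower, upper).
    """
    deg = degrees(edges)
    values = [deg[i] for i in individuals]  # isolated individuals get degree 0
    n = len(values)
    rng = random.Random(seed)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    alpha = (1 - level) / 2
    lower = means[int(alpha * n_boot)]
    upper = means[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return sum(values) / n, lower, upper


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
individuals = ["a", "b", "c", "d", "e"]  # "e" was monitored but never associated
observed, lower, upper = bootstrap_mean_degree_ci(edges, individuals)
print(observed, lower, upper)
```

A wide interval relative to the observed value would suggest that the sampled individuals are too few to pin the metric down, which is the kind of reliability check the abstract describes.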

2

A natural history of networks: Higher-order network modeling for paleobiology research

Rojas, A.; Eriksson, A.; Neuman, M.; Edler, D.; Blocker, C.; Rosvall, M.

2022-09-27 paleontology 10.1101/2022.09.26.509538 medRxiv

Top 0.1%

77.6%

Paleobiologists are increasingly employing network-based methods to analyze the complex data retrieved from geohistorical records, including stratigraphic sections, sediments, and fossil collections. However, the lack of a common framework for designing, performing, evaluating, and communicating these studies leads to issues of reproducibility and communicability. The high-dimensional geohistorical data also raise questions about the limitations of standard network approaches, which assume independent interactions between pairs of components. Higher-order network models better suited for the complex relational structure of the geohistorical data provide an opportunity to overcome these challenges. These models can represent temporal and spatial constraints inherent to the biosedimentary record and describe higher-order interactions, capturing more accurate biogeographical, biostratigraphic, and macroevolutionary patterns. Here we describe how to use the Map Equation framework for designing higher-order network models of geohistorical data, address some practical decisions involved in modeling complex dependencies, and discuss critical methodological and conceptual issues that currently make it difficult to compare results across studies in the growing body of network-based paleobiology research. We illustrate different higher-order network representations and models, including multilayers, hypergraphs, and varying Markov times models, using case studies on gradient analysis, bioregionalization, and macroevolution, and delineate future research directions for current challenges in the emerging field of network paleobiology.

3

Efficient Functions for Energetic Least-Cost Analysis over Land and Water in the lbmech package for R

Mejia Ramon, A. G.

2023-07-10 ecology 10.1101/2023.07.09.548254 medRxiv

Top 0.1%

73.0%

Current least-cost tools for behavioral scientists are computationally insufficient and lack methods to independently derive animal-specific energetic cost functions based on locational data. Moreover, topologically valid approaches only measure the biomechanical cost of movement, when sometimes net metabolic expenditure may be a more important consideration. In this paper, I first derive generalizable cost functions for energetic expenditure for any arbitrary biped or quadruped, versus the currently employed regressed functions with limited cross-taxa or behavioral mode applicability. I then describe the cost-distance functions of the R package lbmech, designed to (1) regress the parameters describing behavioral sensitivity to slope from GPS tracking data; and (2) calculate the costs of movement in terms of time, kinetic work, and net metabolic expenditure. Finally, I demonstrate the utility of this package using publicly available GPS, ocean current velocity, and elevation data from the Balearic Islands of Spain. This approach greatly facilitates the collection of field-based data given the reliance on GPS data. Moreover, it allows us to identify and consider individual- and group-level effects and variances that may lead to differential movement-related fitness with evolutionary consequences.
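
To make the notion of a time-based least-cost analysis concrete, here is a minimal Python sketch that runs Dijkstra's algorithm over an elevation grid. It is not lbmech's method or API: the slope-to-speed relation used is Tobler's hiking function, swapped in as a well-known stand-in for the cost functions the paper derives, and the function names, the 4-neighbour moves, and the 100 m cell size are assumptions.

```python
import heapq
import math


def tobler_speed_kmh(slope):
    """Walking speed (km/h) from Tobler's hiking function; slope is rise/run."""
    return 6.0 * math.exp(-3.5 * abs(slope + 0.05))


def least_time_hours(elev, start, goal, cell_m=100.0):
    """Shortest travel time between two cells of an elevation grid (metres).

    Moves are restricted to the four orthogonal neighbours (an assumption of
    this sketch). Cost is direction-dependent because slope changes sign.
    """
    rows, cols = len(elev), len(elev[0])
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        t, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return t
        if t > best.get((r, c), math.inf):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                slope = (elev[nr][nc] - elev[r][c]) / cell_m
                step_hours = (cell_m / 1000.0) / tobler_speed_kmh(slope)
                nt = t + step_hours
                if nt < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = nt
                    heapq.heappush(queue, (nt, (nr, nc)))
    return math.inf


flat = [[0.0, 0.0, 0.0]]
hill = [[0.0, 10.0]]
print(least_time_hours(flat, (0, 0), (0, 2)))
print(least_time_hours(hill, (0, 0), (0, 1)), least_time_hours(hill, (0, 1), (0, 0)))
```

The second print shows the anisotropy that matters for movement ecology: the uphill and downhill traversals of the same cell pair take different times.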

4

Multi-animal behavioral tracking and environmental reconstruction using drones and computer vision in the wild

Koger, B.; Deshpande, A.; Kerby, J. T.; Graving, J. M.; Costelloe, B. R.; Couzin, I. D.

2022-07-02 animal behavior and cognition 10.1101/2022.06.30.498251 medRxiv

Top 0.1%

72.5%

Methods for collecting animal behavior data in natural environments, such as direct observation and bio-logging, are typically limited in spatiotemporal resolution, the number of animals that can be observed, and information about animals' social and physical environments. Video imagery can capture rich information about animals and their environments, but image-based approaches are often impractical due to the challenges of processing large and complex multi-image datasets and transforming resulting data, such as animals' locations, into geographic coordinates. We demonstrate a new system for studying behavior in the wild that uses drone-recorded videos and computer vision approaches to automatically track the location and body posture of free-roaming animals in georeferenced coordinates with high spatiotemporal resolution embedded in contemporaneous 3D landscape models of the surrounding area. We provide two worked examples in which we apply this approach to videos of gelada monkeys and multiple species of group-living African ungulates. We demonstrate how to track multiple animals simultaneously, classify individuals by species and age-sex class, estimate individuals' body postures (poses), and extract environmental features, including topography of the landscape and animal trails. By quantifying animal movement and posture, while simultaneously reconstructing a detailed 3D model of the landscape, our approach opens the door to studying the sensory ecology and decision-making of animals within their natural physical and social environments.

5

vassi - verifiable, automated scoring of social interactions in animal groups

Nührenberg, P.; Bose, A.; Jordan, A.

2025-07-21 animal behavior and cognition 10.1101/2025.07.15.664909 medRxiv

Top 0.1%

71.4%

Behavioral biologists, from neuroscientists to ethologists, rely on observation and scoring of behavior. In the past decade, numerous methods have emerged to automate this scoring through machine learning approaches. Yet, these methods are typically specified towards laboratory settings with only two animals, or employed in cases with well-separated behavioral categories. Here, we introduce the vassi Python package, focusing on supervised classification of directed social interactions and cases in which continuous variation in behavior means categories are less distinct. Our package is broadly applicable across species and social settings, including single individuals, pairs and groups, and implements a validation tool to separate behavioral edge cases. vassi has comparable performance to existing approaches on a behavioral classification benchmark, the CALMS21 mouse resident-intruder dataset, and we demonstrate its applicability on a novel, more naturalistic and complex dataset of cichlid fish groups. Our approach highlights future challenges in extending supervised behavioral classification to more naturalistic settings, and offers a methodological framework to overcome these challenges. Lay Summary: vassi (verifiable, automated scoring of social interactions) is a flexible, Python-based framework for automated behavioral classification and its verification through interactive visualization. vassi enables researchers to quantify directed social interactions in animal groups in naturalistic settings, bridging the gap between traditional ethology and modern computational tools.

6

Distilling complex evolutionary histories with shiftPlot

Miller, E. T.; Martin, B. S.

2022-03-18 evolutionary biology 10.1101/2022.03.16.484646 medRxiv

Top 0.1%

69.9%

Phylogenies form the backbone of many modern comparative methods and are integral components of contemporary science communication. Recent years have seen drastic increases in both the size and complexity of phylogenetic data as computational resources and genetic/trait databases expand. Graphical representations of these massive phylogenetic datasets push against the limits of legibility, often veering closer to artwork than scientific figures optimized to communicate results. While attractive scientific illustrations are certainly a laudable goal, researchers may want to opt for simpler representations to communicate results more concisely. Here, we introduce a new R package, shiftPlot, which implements methods for simplifying and plotting phylogenetic comparative data on discrete traits. Specifically, shiftPlot automatically finds and collapses clades exhibiting the same character state, effectively creating smaller phylogenies that may be more legibly rendered on standard page sizes. Further, these visualizations more clearly communicate evolutionary dynamics by emphasizing state shifts over tip states. While there are undoubtedly situations where this graphical approach will not be suitable (e.g., continuous traits), we believe shiftPlot will prove useful for modern researchers faced with the task of communicating the results of complex phylogenetic analyses.

7

sabinaHSBM: An R package for link prediction network reconstruction using Hierarchical Stochastic Block Models

Lima, H.; Morales-Barbero, J.; Mateo, R. G.; Morales-Castilla, I.; Rodriguez, M. A.

2025-10-29 ecology 10.1101/2025.10.28.684773 medRxiv

Top 0.1%

69.6%

Network analysis is a powerful framework for investigating complex systems across disciplines, including ecology, where it helps uncover patterns in predator-prey, host-parasite, or plant-pollinator interactions. However, ecological network data are often incomplete or error-prone due to sampling limitations, detection failures, and taxonomic uncertainty, leading to missing (false negative) and spurious (false positive) links that obscure structure and hinder inference. The hierarchical stochastic block model (HSBM), particularly in its degree-corrected form, is among the most effective tools for reconstructing networks under such uncertainty. Despite its robustness, the primary implementation of HSBM in the Python-based graph-tool library has remained largely inaccessible to ecologists. Here, we introduce sabinaHSBM, the first R package that makes degree-corrected HSBM broadly available through a user-friendly, flexible workflow. By bridging a gap between advanced network modeling and widely used ecological analysis platforms, sabinaHSBM facilitates network reconstruction and link prediction from binary bipartite data. The workflow involves three main steps: (1) preparing input data, (2) estimating posterior link probabilities, and (3) reconstructing the network. The package supports detection of undocumented and spurious links, exploration of hierarchical structure, and propagation of uncertainty throughout. Key features include cross-validation, flexible thresholding, probabilistic evaluation metrics, and two link prediction modes: estimating all link probabilities or identifying undocumented ones. We illustrate the package's functionality through a case study using a published global dataset of carnivore-parasite associations, showing that inferred groupings are phylogenetically clustered. To assess predictive accuracy, we examined the top 10 highest-probability links identified by the model and found published evidence for 8, despite their absence from the original dataset. This highlights the model's ability to recover biologically meaningful but underreported interactions. By integrating all components of HSBM-based reconstruction into an accessible R package, sabinaHSBM empowers researchers to improve relational data quality and uncover overlooked patterns in complex ecological networks and beyond.

8

Recording animal-view videos of the natural world

Vasas, V.; Lowell, M. C.; Villa, J.; Jamison, Q. D.; Siegle, A. G.; Katta, P. K. R.; Bhagavathula, P.; Kevan, P. G.; Fulton, D.; Losin, N.; Kepplinger, D.; Salehian, S.; Forkner, R. E.; Hanley, D.

2022-11-23 animal behavior and cognition 10.1101/2022.11.22.517269 medRxiv

Top 0.1%

65.9%

Plants, animals, and fungi display a rich tapestry of colors. Animals, in particular, use colors in dynamic displays performed in spatially complex environments. In such natural settings, light is reflected or refracted from objects with complex shapes that cast shadows and generate highlights. In addition, the illuminating light changes continuously as viewers and targets move through heterogeneous, continually fluctuating, light conditions. Although traditional spectrophotometric approaches for studying colors are objective and repeatable, they fail to document this complexity. Worse, they miss the temporal variation of color signals entirely. Here, we introduce hardware and software that provide ecologists and filmmakers the ability to accurately record animal-perceived colors in motion. Specifically, our Python codes transform photos or videos into perceivable units (quantum catches) for any animal of known photoreceptor sensitivity. We provide the plans, codes, and validation tests necessary for end-users to capture animal-view videos. This approach will allow ecologists to investigate how animals use colors in dynamic behavioral displays, the ways natural illumination alters perceived colors, and other questions that remained unaddressed until now due to a lack of suitable tools. Finally, our pipeline provides scientists and filmmakers with a new, empirically grounded approach for depicting the perceptual worlds of non-human animals.

9

ipmr: Flexibly implement Integral Projection Models in R

Levin, S. C.; Childs, D. Z.; Compagnoni, A.; Evers, S.; Knight, T.; Salguero-Gomez, R.

2021-04-21 ecology 10.1101/2021.04.20.440590 medRxiv

Top 0.1%

65.0%

Integral projection models (IPMs) are an important tool for studying the dynamics of populations structured by one or more continuous traits (e.g. size, height, color). Researchers use IPMs to investigate questions ranging from linking drivers to plant population dynamics, planning conservation and management strategies, and quantifying selective pressures in natural populations. The popularity of stage-structured population models has been supported by R scripts and packages (e.g. IPMpack, popbio, popdemo, lefko3) aimed at ecologists, which have introduced a broad repertoire of functionality and outputs. However, pressing ecological, evolutionary, and conservation biology topics require developing more complex IPMs, and considerably more expertise to implement them. Here, we introduce ipmr, a flexible R package for building, analyzing, and interpreting IPMs. The ipmr framework relies on the mathematical notation of the models to express them in code format. Additionally, this package decouples the model parameterization step from the model implementation step. The latter point substantially increases ipmr's flexibility to model complex life cycles and demographic processes. ipmr can handle a wide variety of models, including density dependence, discretely and continuously varying stochastic environments, and multiple continuous and/or discrete traits. ipmr can accommodate models with individuals cross-classified by age and size. Furthermore, the package provides methods for demographic analyses (e.g. asymptotic and stochastic growth rates) and visualization (e.g. kernel plotting). ipmr is a flexible R package for integral projection models. The package substantially reduces the amount of time required to implement general IPMs. We also provide extensive documentation with six vignettes and help files, accessible from an R session and online.
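
For readers unfamiliar with how an IPM yields an asymptotic growth rate, the following Python sketch discretises a kernel with the midpoint rule and finds its dominant eigenvalue by power iteration. This is a generic illustration rather than ipmr's interface; the function names, the mesh size, and the toy constant kernel are assumptions chosen so the result can be checked by hand.

```python
def midpoint_mesh(lower, upper, n):
    """Return the n cell midpoints on [lower, upper] and the cell width."""
    h = (upper - lower) / n
    return [lower + (i + 0.5) * h for i in range(n)], h


def growth_rate(kernel, lower, upper, n=100, iters=500, tol=1e-12):
    """Dominant eigenvalue of the midpoint-rule discretisation of kernel(z_new, z_old).

    Assumes a non-negative kernel so that power iteration converges to the
    asymptotic population growth rate.
    """
    z, h = midpoint_mesh(lower, upper, n)
    matrix = [[kernel(z_new, z_old) * h for z_old in z] for z_new in z]
    v = [1.0 / n] * n  # trait distribution, kept normalised to sum to 1
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        new_lam = sum(w)  # total population after one step, since sum(v) == 1
        if new_lam == 0.0:
            return 0.0
        v = [x / new_lam for x in w]
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam


# Toy check: a constant kernel c on [0, 1] has growth rate exactly c.
print(growth_rate(lambda z_new, z_old: 0.8, 0.0, 1.0, n=50))
```

A realistic kernel would combine survival, growth, and reproduction functions of the trait; the constant kernel is used only because its growth rate is known analytically.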

10

BiMultiNetPlot: An R package for visualizing ecological bipartite multilayer networks

Li, H.-D.

2024-09-24 ecology 10.1101/2024.09.20.613870 medRxiv

Top 0.1%

64.3%

With the increasing study of ecological multilayer bipartite networks, the visualization of these networks has become more important. However, tools for visualizing multilayer networks are still lacking. I present BiMultiNetPlot, an R package designed for visualizing ecological bipartite multilayer networks. I demonstrate how to use BiMultiNetPlot through a series of examples that represent the most common types of ecological multilayer networks. BiMultiNetPlot is an open-source, flexible package within the ggplot2 environment, helping ecologists better understand the multilayer nature of ecological networks.

11

The camtrapR R package: From data management to interactive ecological analysis of camera trap data

Niedballa, J.; Sollmann, R.; Wilting, A.

2025-09-29 ecology 10.1101/2025.09.26.678697 medRxiv

Top 0.1%

64.2%

Camera trapping has become an indispensable tool in wildlife ecology, generating vast datasets that require efficient and robust analytical workflows. The R package camtrapR was originally developed for preparing and managing camera trap data for subsequent analysis in external modeling packages like unmarked. It has since become a standard tool in the field for this purpose. Here, we introduce a major update that transforms camtrapR from a data preparation tool into a comprehensive, end-to-end analytical platform. The centerpiece of this evolution is the surveyDashboard(), a novel code-free graphical user interface that guides users through the entire analysis pipeline, from data import to final predictions. This update also incorporates enhanced data import functionalities for major standards like Wildlife Insights and Camtrap DP, a complete workflow for fitting community occupancy models, and streamlined tools for environmental covariate extraction. The interactive dashboard provides an integrated environment for the entire analytical process. Users can perform essential exploratory analyses, such as generating species accumulation curves and mapping species detections, before proceeding to model fitting. The interface supports the interactive construction of both single-species and multi-species (community) occupancy models. The dashboard's covariate preparation tools generate inputs for both model fitting and for creating spatial predictions of species occupancy. Furthermore, the update introduces a comprehensive workflow for fitting Bayesian community occupancy models using JAGS or NIMBLE. This allows for hierarchical modeling of species- and community-level responses to environmental drivers, providing deeper insights into wildlife communities. The workflow includes tools for model assessment, such as convergence diagnostics and posterior predictive checks for goodness-of-fit. By integrating a powerful, code-free interface with advanced backend modeling functions, this major update to camtrapR aims to make robust and reproducible camera trap data analysis accessible to a wider audience, including ecologists, wildlife managers, and students. This paper serves as the new definitive reference for the expanded functionality of camtrapR as a comprehensive tool for modern camera trap studies.

12

rarestR: An R package using rarefaction metrics to estimate α-diversity (species richness) and β-diversity (species shared) for incomplete samples

Zou, Y.; Zhao, P.; Wu, N.; Lai, J.; Peres-Neto, P. R.; Axmacher, J. C.

2024-04-30 ecology 10.1101/2024.04.29.591713 medRxiv

Top 0.1%

63.3%

Species abundance data is commonly used to study biodiversity patterns. In this context, estimating α- and β-diversity based on incomplete samples can lead to undersampling biases. It is therefore essential to employ methods that enable accurate comparisons of α- and β-diversity across varying sample sizes. This involves relying on biodiversity measures that are focused on accurately estimating the total number of species within a community, as well as the total number of species shared by two communities. Rarefaction offers such a method, where α-diversity is estimated for standardized sample sizes. Rarefaction methods can also be used as a basis for β-diversity calculations for standardized sample sizes. In this application note, we introduce a new R package, rarestR, designed to estimate abundance-based α- and β-diversity measures for inconsistent samples using rarefaction metrics. Additionally, the package offers parametric extrapolations to estimate the total expected number of species within a single community and the total expected number of species shared between two communities. Furthermore, it provides visualization for the curve fitting associated with these estimators. Overall, the rarestR package is useful in estimating α- and β-diversity values for incomplete samples, for example in studies involving highly mobile or species-rich taxa. These species estimators offer a complementary approach to non-parametric methods, such as the Chao series of estimators.
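
As a concrete illustration of estimating α-diversity at a standardized sample size, the Python sketch below computes the expected number of species in a random subsample of n individuals using the standard hypergeometric rarefaction formula. It is not rarestR's API, and the function name is hypothetical; the package's own metrics and extrapolations may differ from this classic form.

```python
from math import comb


def rarefied_richness(abundances, n):
    """Expected species count in n individuals drawn without replacement.

    For each species with N_i individuals out of N in total, the chance of
    missing it entirely is C(N - N_i, n) / C(N, n); richness is the sum of
    the complementary probabilities.
    """
    counts = [a for a in abundances if a > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of individuals")
    denom = comb(total, n)
    return sum(1 - comb(total - a, n) / denom for a in counts)


community = [5, 3, 2]  # individuals per species
for size in (1, 5, 10):
    print(size, rarefied_richness(community, size))
```

Comparing two communities at the same n removes the advantage that a larger sample would otherwise have in raw species counts.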

13

Behavioural state inference from movement and environmental data using Markovian step selection functions

Bouderbala, I.; Nicosia, A.; Fortin, D.

2026-02-07 animal behavior and cognition 10.64898/2026.02.05.704063 medRxiv

Top 0.1%

59.8%

Movement paths reflect temporal shifts in behavioural states, typically driven by internal and external drivers. However, the inherently multiphasic nature of these trajectories is frequently overlooked in empirical studies, an oversight that can hinder progress in our understanding of movement ecology. While Hidden Markov Models (HMMs) can successfully identify latent states (such as foraging or travelling), they face significant challenges, particularly in determining the appropriate number of states and in interpreting their ecological relevance in the context of both movement patterns and environmental covariates. We present a framework based on Hidden Markov Models with Step Selection Functions (HMM-SSFs) that identifies behavioural states, represented by ecologically meaningful labels linked to explicit hypotheses about animal movement, that best explain observed movement patterns. The framework imposes interpretable conditions and diagnostic criteria on the post-identified behavioural states to ensure ecological coherence. It is grounded in the evaluation of biologically motivated scenarios rather than purely data-driven partitioning. The framework proceeds in two main steps: first, movement-based states are identified using movement-derived covariates only; second, these states are refined by incorporating environmental predictors, such as habitat structure or species interactions (e.g., predator-prey dynamics). This sequential integration enables the detection of ecological responses that are conditional on behavioural context. Simulations show that the framework effectively recovers behavioural states across most conditions. State decoding accuracy was notably higher when control locations were drawn from an exponential-family distribution, compared to a uniform one. The exponential-family approach improved state separation and reduced mislabelling, especially when few control locations were generated. However, low state persistence, particularly in Encamped behaviours, resulted in an overestimation of the number of states. These findings underscore the influence of transition probabilities on behavioural labelling. Finally, we applied our framework to zebra (Equus quagga) movement data by combining movement predictors with changes in direction toward the nearest preferred habitat. This enabled us to distinguish between habitat-dependent and habitat-independent travelling behaviours, as well as to identify spatially finer-scale states such as the encamped state. The proposed framework balances complexity and biological interpretability by using basic movement metrics to identify the behavioural states and their sequence that best explain multiphasic movement paths, together with environmental factors directing movement in each state. Unlike traditional methods that predefine the number of states, the framework estimates both state number and labels, offering a flexible and ecologically meaningful approach for behavioural inference.

14

From raw footage to behaviour: an automated pipeline for activity monitoring and behaviour identification in the wild

Silva, L. R.; Ferreira, A. C.; Doutrelant, C.; Covas, R.

2024-10-25 animal behavior and cognition 10.1101/2024.10.25.620052 medRxiv

Top 0.1%

59.4%

Studies of animal behaviour usually rely on direct observations or manual annotations of video recordings. However, such methods can be very time-consuming and error-prone, leading to sub-optimal sample sizes. Recent advances in deep learning show great potential to overcome such limitations; nevertheless, most currently available behavioural recognition solutions remain focused on captivity settings. Here, we present a deployment-focused framework to guide researchers in building behavioural recognition systems from video data, using Long Short-Term Memory (LSTM) networks to classify behavioural sequences across consecutive frames. LSTMs allowed us to: 1) monitor nest activity by detecting the birds' presence and simultaneously classifying the type of trajectory, i.e., nest-chamber entrance or exit; and 2) identify the behaviour performed: building, aggression or sanitation. Using our framework, we outperformed human annotators when jointly considering error and speed. Model performance improved with challenging training instances, and remained robust even with modest sample sizes. LSTM also outperformed YOLO ("You Only Look Once"), highlighting the critical role of temporal sequence information in behavioural analysis. We demonstrate that our approach is replicable across three bird species and applicable to deployment videos, highlighting its value as a generalizable and transferable tool for long-term studies in the wild. Data availability: Scripts, models, and data required to reproduce this work are available on Zenodo (DOIs: 10.5281/zenodo.18681623 and 10.5281/zenodo.18695178).

15

Step selection analysis with non-linear and random effects in mgcv

Klappstein, N. J.; Michelot, T.; Fieberg, J.; Pedersen, E. J.; Field, C.; Mills Flemming, J.

2024-01-10 ecology 10.1101/2024.01.05.574363 medRxiv

Top 0.1%

59.2%

Step selection analysis is used to jointly describe animal movement patterns and habitat preferences. Recent work has extended this framework to model inter-individual differences, account for unexplained structure in animals' space use, and capture temporally varying patterns of movement and habitat selection. In this paper, we formulate step selection functions with penalised smooths (similar to generalised additive models) to unify new and existing extensions, and conveniently implement the models in the popular, open-source mgcv R package. We explore non-linear patterns of movement and habitat selection, and use the equivalence between penalised smoothing splines and random effects to implement individual-level and spatial random effects. This framework can also be used to fit varying-coefficient models to account for temporally or spatially heterogeneous patterns of selection (e.g., resulting from behavioural variation), or any other non-linear interactions between drivers of the animal's movement decisions. We provide the necessary technical details to understand several key special cases of smooths and their implementation in mgcv, showcase the ecological relevance using two illustrative examples, and provide R code (available at https://github.com/NJKlappstein/smoothSSF) to facilitate the adoption of these methods. This paper is a broad overview of how smooth effects can be applied to increase the flexibility and biological realism of step selection analysis.

16

Improving the integration of AI into existing ecological inference workflows

Cowans, A.; Lambin, X.; Hare, D.; Sutherland, C.

2024-12-03 ecology 10.1101/2024.11.27.625677 medRxiv

Top 0.1%

58.9%

Artificial Intelligence (AI) has revolutionised the process of identifying species and individuals in audio recordings and camera trap images. However, despite developments in sensor technology, machine learning, and statistical methods, a general AI-assisted data-to-inference pipeline has yet to emerge. We argue that this is, in part, due to a lack of clarity around several decisions in existing workflows, including: the choice of classifier used (e.g., semi- vs. fully automated); how classifier confidence scores are used and interpreted; and the availability and selection of appropriate statistical methods for drawing ecological inferences. Here, we attempt to conceptualise a general workflow associated with automated tools in ecology. We motivate this perspective using our experiences with occupancy modelling using monitoring data collected through passive acoustic monitoring and camera trapping, identifying priority areas for future developments. We offer an accessible guide to support the ecological community in navigating and capitalising on rapid technological and methodological advances. We describe how different error types arise from both sensor-based monitoring and from classifiers themselves; how different error types are handled at each stage of the workflow; and finally, implications and opportunities associated with deciding on methods used at each step of the pipeline. We recommend that "black box" tools like neural network classification algorithms should be embraced in ecology, but widespread uptake requires more formal integration of AI into the existing ecological inference workflows. Like ecological AI more broadly, however, successful development of new data-to-inference pipelines is a multidisciplinary endeavour that requires input from everyone invested in collecting, processing, analysing, and using ecological monitoring data.

17

tidysdm: leveraging the flexibility of tidymodels for Species Distribution Modelling in R

Leonardi, M.; Colucci, M.; Manica, A.

2023-11-22 ecology 10.1101/2023.07.24.550358 medRxiv

Top 0.1%

58.8%

In species distribution modelling (SDM), it is common practice to explore multiple machine-learning algorithms and combine their results into ensembles. This is no easy task in R: different algorithms were developed independently, with inconsistent syntax and data structures. Specialised SDM packages integrate multiple algorithms by creating a complex interface between the user (providing a unified input and receiving a unified output) and the back-end code (which handles the specific needs of each algorithm). Creating and maintaining such an interface requires a lot of work, and it prevents easy integration of other methods that may become available. Here we present tidysdm, an R package that solves this problem by taking advantage of the tidymodels universe. Being part of the tidyverse, tidymodels (i) has standardised grammar and data structures that provide a coherent interface for modelling, (ii) includes packages designed for fitting, tuning, and validating various models, and (iii) allows easy integration of new algorithms and methods. tidysdm allows easy, flexible and quick species distribution modelling by supporting standard algorithms, including additional SDM-oriented functions, and offering the option to use any algorithm or procedure to fit, tune and validate a large number of different models. Additionally, it provides further functions to easily fit models based on paleo/time-scattered data. The package includes two vignettes detailing standard procedures for present-day and time-scattered data. These vignettes also showcase the integration with pastclim (Leonardi et al. 2023) to allow easier access to palaeoclimatic data series, if needed, but users can bring in their own climatic data in standard formats.

18

rTPC and nls.multstart: a new pipeline to fit thermal performance curves in R.

Padfield, D.; O'Sullivan, H.; Pawar, S.

2020-12-16 ecology 10.1101/2020.12.16.423089 medRxiv

Top 0.1%

58.8%

The quantification of thermal performance curves (TPCs) for biological rates has many applications to problems such as predicting species responses to climate change. There is currently no widely used open-source pipeline to fit mathematical TPC models to data, which limits the transparency and reproducibility of the curve fitting process underlying applications of TPCs. We present a new pipeline in R that currently allows for reproducible fitting of 24 different TPC models using non-linear least squares (NLLS) regression. The pipeline consists of two packages - rTPC and nls.multstart - that allow multiple start values for NLLS fitting and provide helper functions for setting start parameters. This pipeline overcomes previous problems that have made NLLS fitting and estimation of key parameters difficult or unreliable. We demonstrate how rTPC and nls.multstart can be combined with other packages in R to robustly and reproducibly fit multiple models to multiple TPC datasets at once. In addition, we show how model selection or averaging, weighted model fitting, and bootstrapping can easily be implemented within the pipeline. This new pipeline provides a flexible and reproducible approach that makes the challenging task of fitting multiple TPC models to data accessible to a wide range of users.
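
The multi-start idea in this abstract can be illustrated with a minimal, self-contained sketch. It is written in Python rather than R and does not use the rTPC or nls.multstart APIs. The Gaussian curve, the parameter names (`rmax`, `topt`, `width`), the start bounds, and the hand-written Levenberg-Marquardt optimiser are all assumptions chosen for illustration. Many random start values are tried, each is refined by least squares, and the fit with the lowest residual sum of squares is kept (with a single model and dataset, this ordering matches any information criterion).

```python
import math
import random


def gaussian_tpc(temp, rmax, topt, width):
    """Illustrative Gaussian curve: peak rate rmax at temperature topt."""
    return rmax * math.exp(-0.5 * ((temp - topt) / width) ** 2)


def rss(params, temps, rates):
    """Residual sum of squares; infinite if the parameters are numerically invalid."""
    try:
        return sum((r - gaussian_tpc(t, *params)) ** 2 for t, r in zip(temps, rates))
    except (ZeroDivisionError, OverflowError):
        return math.inf


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_from_start(start, temps, rates, max_iter=200):
    """Levenberg-Marquardt least squares from one start; returns (params, rss)."""
    p, lam = list(start), 1e-2
    cur = rss(p, temps, rates)
    for _ in range(max_iter):
        try:
            base = [gaussian_tpc(t, *p) for t in temps]
            res = [r - f for r, f in zip(rates, base)]
            jac = []  # forward-difference Jacobian, one row per observation
            for t, f in zip(temps, base):
                row = []
                for k in range(len(p)):
                    h = 1e-6 * max(1.0, abs(p[k]))
                    q = p[:]
                    q[k] += h
                    row.append((gaussian_tpc(t, *q) - f) / h)
                jac.append(row)
            n = len(p)
            jtj = [[sum(r[i] * r[j] for r in jac) for j in range(n)] for i in range(n)]
            jtr = [sum(r[i] * e for r, e in zip(jac, res)) for i in range(n)]
            for i in range(n):
                jtj[i][i] += lam  # damping term added to the diagonal
            step = solve(jtj, jtr)
        except (ZeroDivisionError, OverflowError):
            break
        cand = [pi + si for pi, si in zip(p, step)]
        new = rss(cand, temps, rates)
        if new < cur:  # accept the step and relax the damping
            improvement = cur - new
            p, cur, lam = cand, new, max(lam / 10, 1e-12)
            if improvement < 1e-14:
                break
        else:  # reject the step and increase the damping
            lam *= 10
            if lam > 1e12:
                break
    return p, cur


def multistart_fit(temps, rates, lower, upper, n_starts=50, seed=0):
    """Fit from many random starts drawn between lower and upper; keep the best."""
    rng = random.Random(seed)
    best = (None, math.inf)
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        fit = fit_from_start(start, temps, rates)
        if fit[1] < best[1]:
            best = fit
    return best


temps = list(range(10, 42, 2))
rates = [gaussian_tpc(t, 2.0, 28.0, 6.0) for t in temps]  # noise-free synthetic data
params, best_rss = multistart_fit(temps, rates, lower=[0.1, 5.0, 1.0], upper=[5.0, 45.0, 20.0])
print(params, best_rss)
```

Starts that land in a flat region of the curve stall quickly and are simply outcompeted by better starts, which is the practical benefit of trying many start values instead of hand-picking one.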

19

TAUS: Target-Age Unified Survival. Survival analysis without assuming proportional hazards or parameterising the survival function.

Casas Gomez-Uribarri, I.; Babayan, S. A.; Okumu, F.; Baldini, F.; Betancourth, M. P.

2026-04-17 ecology 10.64898/2026.04.15.718114 medRxiv

Top 0.1%

58.7%

Standard survival analysis methods often rely on the assumption of proportional hazards (PH) or parameterisations of the survival function that might not be appropriate for wild populations. To enable survival analysis without these modelling constraints, we developed an approach that combines the Kaplan-Meier estimator with conditional probability theory to compute age-specific probabilities of survival up to some target age of choice, τ. Marginalising this probability over the age distribution of the population yields Oτ, the probability that a randomly sampled individual of unknown age will outlive the target age τ. Notably, the value for τ is set by the analyst for each group independently, which allows accounting for differences in pace of life across populations. We tested its application using a simulation study and two real-world datasets, and compared its performance against that of Cox PH and parametric survival models. The PH assumption was violated in all three examples, rendering the Cox PH models inappropriate. Parametric models offered a better alternative, but the best parametric fit missed at least some key survival patterns in all examples. The TAUS model provided a valid description of survival patterns in all cases. Its richer output also allowed finer analysis of survival differences between populations. The TAUS model is also available as an R package (https://github.com/casasgomezuribarri/TAUS). This new approach to survival analysis without PH or parametric assumptions allows the comparison of survival probabilities across populations with different age structures and paces of life. This makes it suitable for a wide range of ecological applications, including population viability analysis, epidemiology, and life-history theory.
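
One plausible reading of the calculation described in this abstract is sketched below in Python. It is not the TAUS R package, and the exact estimator in the paper may differ. The sketch assumes that the age-specific probability is the ratio S(τ)/S(a) of Kaplan-Meier survival estimates for an individual currently aged a, and that the marginal probability is the weighted average of these ratios over a user-supplied age distribution. All function names and the toy data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier steps as a list of (event_time, survival_probability)."""
    steps, surv = [], 1.0
    event_times = sorted({d for d, e in zip(durations, events) if e})
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if e and d == t)
        surv *= 1.0 - deaths / at_risk
        steps.append((t, surv))
    return steps


def survival_at(steps, age):
    """Step-function estimate of S(age) = P(lifetime > age)."""
    surv = 1.0
    for t, s in steps:
        if t > age:
            break
        surv = s
    return surv


def survive_to_target(steps, current_age, tau):
    """P(lifetime > tau | lifetime > current_age) = S(tau) / S(current_age)."""
    if current_age >= tau:
        return 1.0  # already at or past the target age
    s_now = survival_at(steps, current_age)
    if s_now == 0.0:
        raise ValueError("no individuals are estimated to be alive at this age")
    return survival_at(steps, tau) / s_now


def marginal_outlive(steps, age_weights, tau):
    """Average the conditional probability over an age distribution {age: weight}."""
    total = sum(age_weights.values())
    return sum(w / total * survive_to_target(steps, a, tau) for a, w in age_weights.items())


# Toy data (hypothetical): lifetimes in days, event = 1 for death, 0 for censoring.
durations = [2, 3, 3, 5, 6]
events = [1, 1, 0, 1, 1]
steps = kaplan_meier(durations, events)
age_weights = {0: 0.5, 2: 0.3, 3: 0.2}  # assumed age structure of the population
print(marginal_outlive(steps, age_weights, tau=5))  # approximately 0.3625
```

In the toy data the Kaplan-Meier estimates are S(2) = 0.8, S(3) = 0.6 and S(5) = 0.3, so the marginal probability is 0.5 × 0.3 + 0.3 × 0.3/0.8 + 0.2 × 0.3/0.6 = 0.3625.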

20

Temporal disaggregation through interval-integrated B-splines for the integrated analysis of trapping counts in ecology

Fajgenblat, M.; Neyens, T.

2025-08-11 ecology 10.1101/2025.08.07.669113 medRxiv

Top 0.1%

58.4%

Passive trapping techniques such as pitfall and malaise traps probably constitute the most widely used methods for standardised surveys of invertebrate populations worldwide. These methods typically yield aggregated count data over multi-day trapping periods, often spanning several weeks, during which species activity (i.e. phenology) can vary. The analysis of trapping data collected over temporally misaligned sampling intervals is challenging, hampering the integrated analysis of historically available trapping datasets. We introduce a temporal disaggregation approach using interval-integrated B-splines to analyse data collected over misaligned sampling intervals while accounting for phenological influences. We present computationally efficient Taylor series approximations for integrating exponentiated B-splines over sampling intervals. We further tailor our approach to typical trapping datasets by providing several extensions, including joint species distribution modelling. Through simulations and cross-validation, we demonstrate that our approach of temporal disaggregation outperforms naive approaches and provides improved inference on phenology and other parameters of interest, such as inter-annual trends. The first-order Taylor approximation, which can be fit using regular software routines, properly accounts for heterogeneity in sampling duration and timing, while the second-order Taylor approximation and the exact model additionally allow for improved estimation of phenological patterns. By applying this model to a large pitfall trapping dataset, spanning almost 50 years and over 10,000 trapping events in the Belgian province of Limburg, we illustrate how this approach can be used to reveal phenological, spatiotemporal and co-distributional patterns for 331 spider species. The interval-integrated B-splines approach we present provides a convenient way to infer phenology and other ecological parameters from temporally aggregated count data obtained over misaligned sampling intervals, facilitating the integrated analysis of heterogeneously collected datasets to infer biodiversity trends.
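
The core quantity in this abstract, the integral of an exponentiated B-spline over a trapping interval, can be sketched in Python as follows. The abstract does not spell out the authors' Taylor expansions, so everything here is an assumption made for illustration: the log capture rate is a cubic B-spline in day of year, the expected count is computed by Simpson's rule as a stand-in for the exact integral, and the low-order approximation shown is the midpoint rule (interval length times the rate at the midpoint), which is what integrating a first-order Taylor expansion of the rate about the midpoint gives. Knots, coefficients, and function names are hypothetical.

```python
import math


def bspline_basis(i, degree, knots, t):
    """Cox-de Boor recursion: value at t of the i-th B-spline basis function."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    span_left = knots[i + degree] - knots[i]
    if span_left > 0:
        value += (t - knots[i]) / span_left * bspline_basis(i, degree - 1, knots, t)
    span_right = knots[i + degree + 1] - knots[i + 1]
    if span_right > 0:
        value += (knots[i + degree + 1] - t) / span_right * bspline_basis(i + 1, degree - 1, knots, t)
    return value


def log_rate(t, coefs, knots, degree=3):
    """Spline for the log of the instantaneous capture rate; t must be below the last knot."""
    return sum(c * bspline_basis(i, degree, knots, t) for i, c in enumerate(coefs))


def expected_count(start, end, coefs, knots, n=200):
    """Integrate exp(log_rate) over the interval with composite Simpson's rule (n even)."""
    h = (end - start) / n
    total = 0.0
    for j in range(n + 1):
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * math.exp(log_rate(start + j * h, coefs, knots))
    return total * h / 3


def midpoint_count(start, end, coefs, knots):
    """Low-order approximation: interval length times the rate at the midpoint."""
    return (end - start) * math.exp(log_rate((start + end) / 2, coefs, knots))


# Clamped cubic knots over one year and hypothetical coefficients with a mid-year peak.
knots = [0, 0, 0, 0, 90, 180, 270, 365, 365, 365, 365]
coefs = [-1.0, -0.5, 0.5, 1.0, 0.5, -0.5, -1.0]

# Two misaligned trapping periods: one week and six weeks.
for start, end in [(100, 107), (100, 142)]:
    print(start, end, expected_count(start, end, coefs, knots), midpoint_count(start, end, coefs, knots))
```

On a log scale the midpoint form is the log interval length plus the spline evaluated at the midpoint, which is why an approximation of this kind can be fitted with a standard offset term. Its error grows with interval length and with how sharply the rate changes within the interval.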